Description

The present invention relates to a method and apparatus for recommending items and, in particular, to a method and apparatus for recommending items using automated collaborative filtering and feature-guided automated collaborative filtering.

The amount of information, as well as the number of goods and services, available to individuals is increasing exponentially. This increase in items and information is occurring across all domains, e.g. sound recordings, restaurants, movies, World Wide Web pages, clothing stores, etc. An individual attempting to find useful information, or forced to decide between competing goods and services, is often faced with a bewildering selection of sources and choices.

Individual sampling of all items, even in a particular domain, may be impossible. For example, sampling every restaurant of a particular type in New York City would tax even the most avid diner. Such a sampling would most likely be prohibitively expensive to carry out, and the diner would have to suffer through many unenjoyable restaurants.

In many domains, individuals have simply learned to manage information overload by relying on a form of generic referral system. For example, in the domain of movie and sound recordings, many individuals rely on reviews written by paid reviewers. These reviews, however, are simply the viewpoint of one, or perhaps two, individuals and may not have a likelihood of correlating with how the individual will actually perceive the movie or sound recording. Many individuals may rely on a review only to be disappointed when they actually sample the item.

One method of attempting to provide an efficient filtering mechanism is to use content-based filtering. The content-based filter selects items from a domain for the user to sample based upon correlations between the content of the item and the user's preferences. Content-based filtering schemes suffer from the drawback that the items to be selected must be of some machine-readable form, or attributes describing the content of the item must be entered by hand. This makes content-based filtering problematic for existing items such as sound recordings, photographs, art, video, and any other physical item that is not inherently machine-readable. While item attributes can be assigned by hand in order to allow a content-based search, for many domains of items such assignment is not practical. For example, it could take decades to enter even the most rudimentary attributes for all available network television video clips by hand.

Perhaps more importantly, even the best content-based filtering schemes cannot provide an analysis of the quality of a particular item as it would be perceived by a particular user, since quality is inherently subjective. So, while a content-based filtering scheme may select a number of items based on the content of those items, a content-based filtering scheme generally cannot further refine the list of selected items to recommend items that the individual will enjoy.
The present invention, in one aspect, relates to a method for recommending an item to a user which the user has not yet rated. A profile for each user is stored in a memory, and each user profile includes ratings given to items by the user. A profile for each item is stored in a memory, and each item profile includes ratings given to the item by the users.

A set of similarity factors is calculated for each user; each similarity factor represents the degree of agreement between two users' opinions for all items. Using these similarity factors, a set of neighboring users is selected for each user. The neighboring users are assigned a weight. Using the weights and the ratings given to items by the neighboring users, a recommendation for an item not yet rated by a user is made.

In some embodiments, the calculating step comprises receiving a rating from one of the users for one of the items. Both the rating user's profile and the rated item's profile are updated with the received rating, and a new set of similarity factors is calculated for the rating user. In other embodiments, ratings for items are received via a local area network or a wide area network.

In one embodiment, once a rating for an item is received from a user, the rated item's profile is retrieved to determine other users that have also rated the item. A new plurality of similarity factors is calculated for the rating user only with respect to the users that have also rated the item.

In certain embodiments, the similarity factors for a particular user are calculated by retrieving an item profile, determining other users that have also rated the item, and calculating a similarity factor between the particular user and each of the other users that have also rated the item. These steps are repeated until all items rated by the particular user have been retrieved and all similarity factors for the user have been calculated. These steps, in turn, are repeated until similarity factors for all users have been calculated.

In one embodiment, the similarity factors are calculated by subtracting the rating given to the item by each of the other users from the rating given to the item by the requesting user, squaring each rating difference, and dividing the sum of the squared differences by the number of other users that have also rated the item.

In another embodiment, the similarity factors may be a Pearson r correlation coefficient between users. In this embodiment, the similarity factor between user x and user y is given by dividing the sum of the products of the mean score given to items by user x subtracted from the rating given to each item by user x and the mean score given to items by user y subtracted from the rating given to each item by user y by the square root of the product of two sums. The first sum is the squared differences between the rating given by user x to each item and the mean score given to all items by user x. The second sum is the squared differences between the rating given by user y to each item and the mean score given by user y to all items.
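As a rough illustration, the Pearson r similarity factor described above could be computed as in the following Python sketch. The function name `pearson_similarity`, the dictionary-based profiles, and the choice to take the sums only over items rated by both users are assumptions made for this example and are not details stated in this description.

```python
from math import sqrt


def pearson_similarity(ratings_x, ratings_y):
    """Pearson r between two users.

    ratings_x and ratings_y map item identifiers to ratings (hypothetical
    layout). Means are taken over each user's own ratings; sums are taken
    over mutually rated items (an assumption of this sketch).
    """
    common = ratings_x.keys() & ratings_y.keys()
    if not common:
        return None  # no mutually rated items, similarity is undefined
    mean_x = sum(ratings_x.values()) / len(ratings_x)
    mean_y = sum(ratings_y.values()) / len(ratings_y)
    numerator = sum(
        (ratings_x[i] - mean_x) * (ratings_y[i] - mean_y) for i in common
    )
    sum_sq_x = sum((ratings_x[i] - mean_x) ** 2 for i in common)
    sum_sq_y = sum((ratings_y[i] - mean_y) ** 2 for i in common)
    denominator = sqrt(sum_sq_x * sum_sq_y)
    if denominator == 0:
        return None  # a user with constant ratings has no defined correlation
    return numerator / denominator
```

Two users whose ratings rise and fall together yield a value of 1, and users with exactly opposed ratings yield a value of -1.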

In other embodiments, users are selected as neighboring users when the similarity factor between two users is less than a predetermined threshold value. In still other embodiments, a weight is assigned to neighboring users by subtracting, for each neighboring user, the similarity factor for that neighboring user from the predetermined threshold value and dividing each difference by the predetermined threshold value.

In some other embodiments, an item is recommended by predicting a rating for each item not yet rated by a user. The predicted rating is arrived at by taking a weighted average of the ratings given to the items by the user's neighboring users. Items are then recommended to the user based on the predicted ratings.

In one embodiment, the user selects an item and a rating is predicted by taking a weighted average of the ratings given to the selected item by the user's neighboring users.

In certain embodiments, information about the recommended items can be provided on a display.
In another aspect, the present invention relates to a method for recommending an item to a user which has not yet been rated by the user, each item belonging to at least one group of items. A set of similarity factors for each user is calculated, representing the degree of agreement in item ratings between users within different groups. Neighboring users are selected within each group, a weight is assigned to each of the neighboring users for each group, and items are recommended based on the weights assigned to the user's neighboring users and the ratings given to the unrated item by the user's neighboring users.

In some embodiments, the similarity factors are calculated by retrieving the item profile for an item that has been rated by a user, determining other users that have also rated the item, calculating a similarity factor between the user and the other users that have also rated the item, and repeating these steps until all items rated by the user in a group have been retrieved. These steps, in turn, are repeated until similarity factors for each user have been calculated. In one embodiment, the similarity factors are calculated only using ratings for other items belonging to the same group.

In other embodiments, each selected neighboring user has a similarity factor less than a predetermined threshold value and neighboring users are assigned weights by subtracting, for each neighboring user, the similarity factor for that neighboring user from a predetermined threshold value and dividing each difference by the predetermined threshold value.
In still other embodiments, a rating is predicted for an item in a group not yet rated by a user by taking a weighted average of the ratings given to the items in the group by the user's neighboring users, and recommending a predetermined number of items from the group based on the predicted ratings for those items.

An item may be selected by the user, in which case a rating is predicted for the selected item by taking a weighted average of the ratings given to the selected item by the user's neighboring users within the group. The user may also select a particular group for which to receive recommendations, in which case a rating is predicted for items in the selected group by taking a weighted average of the ratings given to the items in the group by the user's neighboring users for that group. A predetermined number of items are recommended based on their predicted rating.

In certain embodiments, items belong to more than one group. In other certain embodiments, information about recommended items is provided on a display.

In another aspect, the present invention relates to a method for recommending other users to a user. User profiles are stored in a memory for each user, and each user profile includes ratings given to items by the user. Item profiles are also stored in a memory for each of the items, and each of the items belongs to one of a plurality of groups. Each item profile includes a plurality of ratings given to the item by one of the users.

Similarity factors are calculated for each user, each similarity factor representing the rating similarity between each user and another one of the users for a particular group. Neighboring users are recommended to a user based on the similarity factors.

In a certain embodiment, a neighboring user is recommended based on the similarity factors and the number of items rated by both users.

In another aspect, the present invention relates to a method for recommending an item, the item having at least one feature defining a characteristic of the item and having more than one possible value. The possible values may be grouped into clusters of feature values. User profiles are stored in a memory for each user, and the user profiles include ratings given to items by each user. Item profiles are also stored in a memory for each of the items, and the item profiles include ratings given to each item by the users.

A weight is assigned to each cluster of feature values for each user. A weight is also assigned to each feature for each user. Using the feature weights, the feature value cluster weights, and the ratings given to items by the users, a set of similarity factors is calculated for each user. For each user, a set of neighboring users is selected responsive to the similarity factors; a weight is assigned to each of the neighboring users, and items are recommended to a user based on the weights assigned to the user's neighboring users and the ratings given to the unrated item by the user's neighboring users.
In some embodiments a weight is assigned, for each user, to each value cluster based on the rating given to the item by the user and the number of feature value clusters present.

In other embodiments, a weight is assigned, for each user, to each feature. This can be done based on the number of features defined for the item, or a feature weight may be assigned based on the weights assigned to each feature value cluster. In particular, the feature weights can be assigned by dividing the standard deviation of the cluster weights by the mean of the cluster weights.
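The last rule above, a feature weight equal to the standard deviation of the cluster weights divided by their mean, might be sketched in Python as follows. The function name `feature_weight`, the use of the population standard deviation, and the treatment of a zero mean are assumptions of this sketch; the description does not specify them.

```python
from statistics import mean, pstdev


def feature_weight(cluster_weights):
    """Weight of one feature for one user.

    cluster_weights is a non-empty list holding the weight this user gives
    to each value cluster of the feature (hypothetical layout).
    """
    average = mean(cluster_weights)
    if average == 0:
        return 0.0  # assumed convention: avoid dividing by a zero mean
    # Population standard deviation is assumed here.
    return pstdev(cluster_weights) / average
```

Under this rule, a feature whose clusters all carry the same weight receives a feature weight of zero, while a feature whose clusters are weighted very differently receives a larger weight.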

In still other embodiments, a similarity factor is calculated for each feature value with respect to all other feature values and the feature values are grouped based on the feature value similarity factors.

In another aspect, the present invention relates to a method for recommending an item to a user, the item having at least one feature defining a characteristic of the item and having a plurality of possible values that may be grouped into clusters of feature values. User profiles are stored in a memory for each user, and the user profile includes ratings given to items by the user. Item profiles are stored in a memory for each item, and the item profile includes ratings given to the item by the users.

A weight is assigned to each feature value cluster for each user. A weight is assigned to each feature for each user. Using the feature weights and the feature value cluster weights, similarity factors between items are calculated for a particular user. An item for which a favorable rating has been received from the user is selected, and a number of items are recommended to the user responsive to the item similarity factors.

In another aspect, the present invention relates to an article of manufacture having program means for recommending an item embodied thereon. The article includes computer-readable program means for storing a user profile in a memory; the user profile includes ratings given to the items by the user. The article also includes computer-readable program means for storing an item profile in a memory; the item profile includes ratings given to the item by the users.

Also included are computer-readable program means for calculating similarity factors for each user, each of the similarity factors representing the similarity between each user and another one of the users, and computer-readable program means for selecting neighboring users for each user responsive to the similarity factors.

The article also includes computer-readable program means for assigning a weight to each of the neighboring users, as well as computer-readable program means for recommending an item to one of the users based on the weights assigned to the user's neighboring users and the ratings given to the unrated item by the user's neighboring users.

In yet another aspect, the present invention relates to an article of manufacture including computer-readable program means for predicting a rating for an item, the item having at least one feature defining a characteristic of the item and having a plurality of possible values that may be grouped into clusters of feature values.

Also included on the article are: computer-readable program means for storing a user profile in a memory for each user, each user profile including ratings given to the items by the user; computer-readable program means for storing an item profile in a memory for each item, each item profile including ratings given to the item by the users; computer-readable program means for assigning, for each user, a weight to each value cluster within each feature; computer-readable program means for assigning, for each user, a weight to each feature; computer-readable program means for calculating a set of similarity factors for each user, each similarity factor based on the feature weights, the cluster weights, and the ratings given to items by the respective users; computer-readable program means for selecting, for each user, a set of neighboring users responsive to the similarity factors; computer-readable program means for assigning a weight to each of the neighboring users; and computer-readable program means for recommending an item to one of the users based on the weights assigned to the user's neighboring users and the ratings given to the unrated item by the user's neighboring users.

In another aspect, the present invention relates to an apparatus for recommending an item to a user which has not yet rated the item. The apparatus has a memory element for storing user profiles; each user profile includes ratings given to items by the user. A memory element for storing item profiles is also included; each item profile includes ratings given to the item by the users.

The apparatus also includes means for calculating similarity factors for each user, each of the similarity factors representing the similarity between each user and another one of the users, and means for selecting, for each of the users, a set of neighboring users responsive to the similarity factors.

The apparatus also has means for assigning a weight to each of the neighboring users; and means for recommending at least one of the items to one of the users based on the weights assigned to the user's neighboring users and the ratings given to the unrated item by the user's neighboring users.

In yet another aspect, the present invention relates to an apparatus for recommending an item to a user, the item having at least one feature defining a characteristic of the item and having many possible feature values that may be grouped. The apparatus includes a memory element for storing user profiles; each user profile includes ratings given to the items by the user. A memory element for storing item profiles is also included; each item profile includes ratings given to the item by the users. Means for assigning a weight to each value cluster within each feature is also provided.

The apparatus also includes: means for assigning, for each user, a weight to each feature; means for calculating similarity factors for each user, each of the similarity factors based on the feature weights, the cluster weights, and the ratings given to items by the respective users; means for selecting, for each of the users, a plurality of neighboring users responsive to the similarity factors; means for assigning a weight to each of the neighboring users; and means for recommending an item to one of the users based on the weights assigned to the user's neighboring users and the ratings given to the unrated item by the user's neighboring users.

Certain embodiments of the invention will now be described by way of example and with reference to the accompanying drawings, in which:

FIG. 1 is a flowchart of one embodiment of the method of the present invention;

FIG. 2 is a flowchart of one embodiment of the method of the present invention;

FIG. 3 is a block diagram of an embodiment of the apparatus of the present invention; and

FIG. 4 is a block diagram of the Internet system on which the preferred methods and apparatus may be used.

As referred to in this description, items to be recommended can be items of any type that a user may sample, such as sound recordings, movies, restaurants, vacation destinations, novels, or World Wide Web pages. When reference is made to a "domain," it is intended to refer to any category or subcategory of ratable items, such as sound recordings, movies, or restaurants in a particular city. Referring now to FIG. 1, a method for recommending items begins by storing user and item information in profiles.

A plurality of user profiles is stored in a memory element (step 102). It is preferred that one profile is created for each user; however, multiple profiles may be created for a user to represent that user over multiple domains. The memory element can be any memory element known in the art that is capable of storing sufficient data for the plurality of profiles and allowing the profiles to be updated, such as a disc drive or random access memory.

Each user profile associates ratings given to an item by the user with a particular item. User profiles can be any data construct that facilitates this association, such as an array, although it is preferred to provide user profiles as sparse vectors of ordered pairs. Each ordered pair contains a number representing the rated item and a number representing the rating that the user gave to the item.

In the preferred embodiment, a profile for a user is created when that user first begins rating items, although in multi-domain applications user profiles may be created for particular domains only when the user begins to explore, and rate items within, those domains. Whenever a user profile is created, a number of ratings for items are solicited from the user. This can be done by providing the user with a particular set of items to rate corresponding to a particular group of items. Groups are genres of items and are discussed below in more detail. Other methods of soliciting ratings from the user include: manual entry of item-rating pairs, in which the user simply submits a list of items and ratings assigned to those items; soliciting ratings by date of entry into the system, i.e. asking the user to rate the newest items added to the system; soliciting ratings of the most rated items; or allowing a user to rate items similar to an initial item selected by the user. The preferred embodiment uses all of the methods described above and allows the user to select the particular method they wish to employ.

Profiles are stored for each item that has been rated by at least one user (step 104). Each item profile records how particular users have rated this particular item. Any data construct that associates ratings with certain users can be used. It is preferred to provide item profiles as a sparse vector of ordered pairs. Each ordered pair contains a number representing a particular user and a number representing the rating that user gave to the item. Item profiles are created when the first rating is given to an item. Although FIG. 1 shows item profiles being stored after user profiles, the two may be stored in any order, and can occur simultaneously.
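The two kinds of profile can be pictured with a small Python sketch. The dictionary layout, the sample identifiers, and the helper `build_item_profiles` are hypothetical choices made for illustration; the description only requires sparse vectors of ordered pairs.

```python
# User profiles: user number -> sparse list of (item number, rating) pairs.
user_profiles = {
    1: [(101, 7), (102, 3)],
    2: [(101, 6)],
}


def build_item_profiles(profiles_by_user):
    """Derive item profiles from user profiles.

    The result maps each item number to a sparse list of
    (user number, rating) pairs, one pair per user who rated the item.
    """
    profiles_by_item = {}
    for user, pairs in profiles_by_user.items():
        for item, rating in pairs:
            profiles_by_item.setdefault(item, []).append((user, rating))
    return profiles_by_item


item_profiles = build_item_profiles(user_profiles)
```

An item profile appears only once some user has rated the item, which matches the statement that item profiles are created when the first rating is given.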

A similarity factor for each user is calculated with respect to all other users (step 106), and represents the degree of similarity between any two users with respect to all items. It is currently preferred that the more similar two users are, the closer the similarity factor is to zero. Specialized hardware may be provided for calculating the similarity factors between users, although it is preferred to provide a general-purpose computer with appropriate programming to calculate the similarity factors.

The similarity factor may be calculated by comparing a user's profile with the profile of every other user. This is computationally intensive, however, and it is preferred to calculate the similarity factor by first examining the profiles of each item for which a user has entered a rating and determining which other users have also rated each item. The similarity factors between the user and the other users that have also rated the items are the only similarity factors updated. These steps are repeated for all users.

Ratings for items are received from users. Item ratings can be of any form that allows users to record subjective impressions of items based on their experience of the item. For example, items may be rated on an alphabetic scale ("A" to "F") or a numerical scale (1 to 10). It is currently preferred that ratings are integers between 1 (lowest) and 7 (highest). Ratings can be received as input to a stand-alone machine, as input to a system via electronic mail, or as input to a system via a local area or wide area network. In the currently preferred embodiment, ratings are received as input to a World Wide Web page. Ratings can be received from users singularly or in batches, and may be received from any number of users simultaneously.

Whenever a rating is received from a user, the profile of the user who entered the rating must be updated as well as the profile of the item rated by the user. Profile updates may be stored in a temporary memory location and entered at a convenient time. It is preferred, however, to update profiles whenever a new rating is entered by a user. Ratings are updated by appending a new pair of values to the set of already existing ordered pairs in the profile.

Whenever a user's profile is updated because a rating has been received from the user, new similarity factors between the user and all other users of this system must be calculated. This can be done by comparing the ratings given to items by the user to the ratings given to items by all other users. However, it is currently preferred to reduce the amount of computation necessary in the following manner. Whenever a user enters a rating for an item, that item's profile is retrieved and the identity of other users that have also rated the item is determined. In this way, only the similarity factors between the rating user and other users that have also rated the item are updated.
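The update step described above might be sketched in Python as follows. The function `record_rating` and the dictionary-of-lists profile layout are assumptions of this sketch; it only shows how the item profile identifies the users whose similarity factors need recalculating.

```python
def record_rating(user_profiles, item_profiles, user, item, rating):
    """Append a new rating to both profiles.

    Returns the users who had already rated the item, i.e. the only users
    whose similarity factor with the rating user needs to be updated.
    """
    # Append an (item, rating) pair to the rating user's profile.
    user_profiles.setdefault(user, []).append((item, rating))
    # Read the item profile to find the other users who rated this item.
    co_raters = [u for u, _ in item_profiles.get(item, []) if u != user]
    # Append a (user, rating) pair to the rated item's profile.
    item_profiles.setdefault(item, []).append((user, rating))
    return co_raters
```

A caller would then recompute the similarity factor between the rating user and each returned user, leaving all other similarity factors untouched.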

Any number of methods can be used to calculate the similarity factors. In general, a method for calculating similarity factors between users should minimize the average error between a predicted rating for an item and the rating a user would actually have given the item.

It is also desirable to reduce error in cases involving "extreme" ratings. That is, a method which predicts fairly well for item ratings representing ambivalence towards an item but which does poorly for item ratings representing extreme enjoyment or extreme disappointment with an item is not useful for recommending items to users.

A similarity factor between users refers to any quantity which expresses the degree of correlation between two users' profiles. The following methods for calculating the similarity factor are intended to be exemplary, and in no way exhaustive. Depending on the item domain, different methods will produce optimal results, since users in different domains will tolerate different expected errors.

In the following description of methods, $D_{xy}$ represents the similarity factor calculated between two users, x and y. $H_{ix}$ represents the rating given to item i by user x, I represents all items in the database, and $c_{ix}$ is a Boolean quantity which is 1 if user x has rated item i and 0 if user x has not rated that item.
One method of calculating the similarity between a pair of users is to calculate the average squared difference between their ratings for mutually rated items. Thus, the similarity factor between user x and user y is calculated by subtracting, for each item rated by both users, the rating given to an item by user y from the rating given to that same item by user x and squaring the difference. The squared differences are summed and divided by the total number of mutually rated items. This method is represented mathematically by the following expression:

$$D_{xy} = \frac{\sum_{i \in I} c_{ix}\, c_{iy}\, (H_{ix} - H_{iy})^2}{\sum_{i \in I} c_{ix}\, c_{iy}}$$
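The average squared difference described above can be sketched in a few lines of Python. The function name `mean_squared_difference` and the use of dictionaries mapping items to ratings are assumptions made for this example.

```python
def mean_squared_difference(ratings_x, ratings_y):
    """Average squared rating difference over mutually rated items.

    ratings_x and ratings_y map item identifiers to ratings. Smaller
    values mean more similar users; identical ratings give zero.
    """
    common = ratings_x.keys() & ratings_y.keys()
    if not common:
        return None  # no mutually rated items, similarity is undefined
    total = sum((ratings_x[i] - ratings_y[i]) ** 2 for i in common)
    return total / len(common)
```

Because the result shrinks toward zero as agreement grows, it fits the stated preference that more similar users have a similarity factor closer to zero.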
A similar method of calculating the similarity factor between a pair of users is to divide the sum of their squared rating differences by the number of items rated raised to a power greater than 1. This method is represented by the following mathematical expression:

$$D_{xy} = \frac{\sum_{i \in C_{xy}} (H_{ix} - H_{iy})^2}{|C_{xy}|^{k}}$$

where $|C_{xy}|$ represents the number of items rated by both users and k is greater than 1.
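A minimal Python sketch of this variant follows. The function name `power_scaled_difference` and the default exponent of 2 are assumptions chosen for illustration; the description only requires an exponent greater than 1.

```python
def power_scaled_difference(ratings_x, ratings_y, k=2.0):
    """Sum of squared differences divided by the overlap size raised to k."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    common = ratings_x.keys() & ratings_y.keys()
    if not common:
        return None  # no mutually rated items, similarity is undefined
    total = sum((ratings_x[i] - ratings_y[i]) ** 2 for i in common)
    return total / len(common) ** k
```

With an exponent above 1, two users who disagree by the same average amount are treated as more similar when they have more mutually rated items.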

A third method for calculating the similarity factor between users attempts to factor into the calculation the amount of overlap between two user profiles. Thus, for each item rated by both users, the rating given to an item by user y is subtracted from the rating given to that same item by user x. These differences are squared and then summed. The amount of profile overlap is taken into account by dividing the sum of squared rating differences by a quantity equal to the number of items mutually rated by the users subtracted from the sum of the number of items rated by user x and the number of items rated by user y. This method is expressed mathematically by:

$$D_{xy} = \frac{\sum_{i \in C_{xy}} (H_{ix} - H_{iy})^2}{\sum_{i \in I} c_{ix} + \sum_{i \in I} c_{iy} - |C_{xy}|}$$



EP 0 751 471 A1 







where $|C_{xy}|$ represents the number of items mutually rated by users x and y.
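The overlap-adjusted method could be sketched in Python as below. The function name `overlap_adjusted_difference` and the dictionary profiles are assumptions of this example.

```python
def overlap_adjusted_difference(ratings_x, ratings_y):
    """Sum of squared differences divided by the size of the profile union.

    The divisor is (items rated by x) + (items rated by y) minus
    (items rated by both), as described in the text.
    """
    common = ratings_x.keys() & ratings_y.keys()
    if not common:
        return None  # no mutually rated items, similarity is undefined
    total = sum((ratings_x[i] - ratings_y[i]) ** 2 for i in common)
    divisor = len(ratings_x) + len(ratings_y) - len(common)
    return total / divisor
```

The divisor equals the number of distinct items rated by either user, so it is never smaller than the number of mutually rated items.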

In another embodiment, the similarity factor between two users is a Pearson r correlation coefficient. Alternatively, the similarity factor may be calculated by constraining the correlation coefficient with a predetermined average rating value, A. Using the constrained method, the correlation coefficient, which represents $D_{xy}$, is arrived at in the following manner. For each item rated by both users, A is subtracted from the rating given to the item by user x and the rating given to that same item by user y. Those differences are then multiplied. The summed product of rating differences is divided by the square root of the product of two sums. The first sum is the sum of the squared differences of the predefined average rating value, A, and the rating given to each item by user x. The second sum is the sum of the squared differences of the predefined average value, A, and the rating given to each item by user y. This method is expressed mathematically by:

$$D_{xy} = \frac{\sum_{i \in C_{xy}} (H_{ix} - A)(H_{iy} - A)}{\sqrt{\sum_{i \in U_x} (H_{ix} - A)^2 \; \sum_{i \in U_y} (H_{iy} - A)^2}}$$

where $U_x$ represents all items rated by x, $U_y$ represents all items rated by y, and $C_{xy}$ represents all items rated by both x and y.
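The constrained correlation might be sketched in Python as follows. This is an illustration under stated assumptions: the function name `constrained_pearson` is hypothetical, the default average of 4 is simply the midpoint of the preferred 1 to 7 scale, and the denominator is taken to be the square root of the product of the two sums, as for an ordinary correlation coefficient.

```python
from math import sqrt


def constrained_pearson(ratings_x, ratings_y, average=4.0):
    """Correlation constrained by a predetermined average rating value.

    The numerator runs over items rated by both users; each sum in the
    denominator runs over all items rated by the respective user.
    """
    common = ratings_x.keys() & ratings_y.keys()
    numerator = sum(
        (ratings_x[i] - average) * (ratings_y[i] - average) for i in common
    )
    sum_sq_x = sum((r - average) ** 2 for r in ratings_x.values())
    sum_sq_y = sum((r - average) ** 2 for r in ratings_y.values())
    denominator = sqrt(sum_sq_x * sum_sq_y)
    if denominator == 0:
        return None  # every rating equals the average, so no signal
    return numerator / denominator
```

Agreement on ratings far from the average contributes most to the result, which relates to the earlier remark about reducing error for extreme ratings.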

Regardless of the method used to generate them, the similarity factors are used to select a plurality of users that have a high degree of correlation to a user (step 108). These users are called the user's "neighboring users." The neighboring users are selected from all other users based on having a similarity factor with respect to the requesting user less than a predetermined threshold value, L. The threshold value, L, can be set to any value which improves the predictive capability of the method. In general, the value of L will change depending on the method used to calculate the similarity factors, the item domain, and the number of ratings that have been entered.

A weight is assigned to each of the neighboring users (step 110). In the preferred embodiment, the weights are assigned by subtracting the similarity factor calculated for each neighboring user from the threshold value and dividing by the threshold value. This provides a user weight that is higher, i.e. closer to one, when the similarity factor between two users is smaller. Thus, similar users are weighted more heavily than other, less similar, users.
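Neighbor selection and weighting can be sketched together in Python. The function name `neighbor_weights` and the sample values are assumptions; the rule itself, a weight of (L - D) / L for each user whose similarity factor D is below the threshold L, follows the text.

```python
def neighbor_weights(similarity_factors, threshold):
    """Select neighboring users and assign each a weight.

    similarity_factors maps another user's identifier to that user's
    similarity factor with the requesting user (smaller is more similar).
    Only users with a factor below the threshold become neighbors.
    """
    return {
        user: (threshold - factor) / threshold
        for user, factor in similarity_factors.items()
        if factor < threshold
    }


weights = neighbor_weights({"b": 0.0, "c": 1.0, "d": 3.0}, threshold=2.0)
```

In this example user "b" receives a weight of 1.0, user "c" receives 0.5, and user "d" is not a neighbor because the similarity factor of 3.0 is not below the threshold.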

Once weights are assigned to the neighboring users, an item is recommended to a user (step 112). For applications in which positive item recommendations are desired, items are recommended if the user's neighboring users have also rated the item highly. For an application desiring to warn users away from items, items are displayed as recommended against when the user's neighboring users have also given poor ratings to the item. Once again, although specialized hardware may be provided to select neighboring users and weight them, it is currently preferred to provide an appropriately programmed general-purpose computer to provide these functions.

45 The Item to be recommended may be selected in any fashion, so tong as ihe ratings of the nel^borlng users and 
their assigned weights are taken into account. In one embodiment, a rating is predbted lor each item that has not yet 
been rated by the user This predicted rating is arrived at by latdng a wei^ted average of tiie ratings given to those 
items by the user's neighboring users. A predetermined number of items are then recommended to the user based on 
the predicted ralings. 

The predetermined number of items can be selected such that those items having the highest predicted rating are
recommended to the user. Alternatively, the predetermined number of items may be selected based on having the
lowest predicted rating of all the items.
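
Selecting the predetermined number of items from a set of predicted ratings might look like the following sketch. The function name, the `highest` flag and the sample predictions are hypothetical.

```python
def select_items(predicted, count, highest=True):
    """Return `count` item names ranked by predicted rating.

    predicted: {item: predicted_rating}
    highest=True picks the best-rated items (positive recommendations);
    highest=False picks the worst-rated items (items to warn against).
    """
    ranked = sorted(predicted.items(), key=lambda pair: pair[1], reverse=highest)
    return [item for item, _ in ranked[:count]]


# Hypothetical predicted ratings for four unrated items.
predicted = {"a": 5.5, "b": 2.0, "c": 6.1, "d": 3.3}
recommended = select_items(predicted, 2)
warned_against = select_items(predicted, 2, highest=False)
```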

In another embodiment the user selects an item for which a predicted rating is desired. A rating is predicted by
taking a weighted average of the ratings given to that item by the user's neighboring users.

Whatever method is used, information about the recommended items can be displayed to the user. For example,
in a music domain, the system may display a list of recommended albums including the name of the recording artist,
the name of the album, the record label which made the album, the producer of the album, "hit" songs on the album,
and other information. In the embodiment in which the user selects an item and a rating is predicted for that item, the
system may display the actual rating predicted, or a label representing the predicted rating. For example, instead of
displaying 6.8 for the predicted rating, a system may instead display "highly recommended".

In one embodiment, items are grouped in order to help predict ratings and increase recommendation certainty.
For example, in the broad domain of music, recordings may be grouped according to various genres, such as "opera,"
"pop," "rock," and others. Groups are used to improve performance because predictions and recommendations for
a particular item are made based only on the ratings given to other items within the same group. Groups may be
determined based on information entered by the users; however, it is currently preferred to generate the groups using
the item data itself.

Generating the groups using the item data itself can be done in any manner which groups items together based
on some differentiating feature. For example, in the item domain of music recordings, groups could be generated
corresponding to "pop," "opera," and others.

In the preferred embodiment, item groups are generated by, first, randomly assigning all items in the database to
a number of groups. The number of desired groups can be predetermined or random. For each initial group, calculate
the centroid of the scores for items assigned to that group. This can be done by any method that determines the
approximate mean value of the spectrum of scores contained in the item profiles assigned to the initial group, such as
eigenanalysis. It is currently preferred to average all values present in the initial group.

After calculating the group centroids, determine to which group centroid each item is closest, and move it to that
group. Whenever an item is moved in this manner, recalculate the centroids for the affected groups. Iterate until the
distance between all group centroids and items assigned to each group are below a predetermined threshold or until
a certain number of iterations have been accomplished.
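
The grouping procedure can be sketched as follows, under several stated simplifications. The sketch takes the initial assignment as a parameter (it may be random, as described, or fixed), recomputes all centroids once per pass rather than after every individual move, uses squared Euclidean distance, and stops when a pass changes nothing or when the iteration limit is reached. All names are hypothetical, and item profiles are assumed to be fully populated vectors of equal length.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def group_items(profiles, initial_groups, num_groups, max_iters=20):
    """Return a list giving the final group index of each item profile."""
    groups = list(initial_groups)
    for _ in range(max_iters):
        # Centroid of every group that currently has at least one member.
        centroids = {}
        for g in range(num_groups):
            members = [p for p, gi in zip(profiles, groups) if gi == g]
            if members:
                centroids[g] = centroid(members)
        # Move every item to the group whose centroid is closest.
        new_groups = [
            min(centroids, key=lambda g: squared_distance(p, centroids[g]))
            for p in profiles
        ]
        if new_groups == groups:
            break
        groups = new_groups
    return groups


# Hypothetical item profiles: two low-scored items and two high-scored items.
profiles = [[1, 1], [1, 2], [9, 9], [8, 9]]
final_groups = group_items(profiles, initial_groups=[0, 0, 1, 0], num_groups=2)
```

As with any centroid-based procedure of this kind, the result depends on the initial assignment, which is one reason to cap the number of iterations.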

A method using grouping to improve performance calculates similarity factors for a user with respect to other users
in a particular group (step 106). For example, a user may have one similarity factor with respect to a second user for
the "pop" grouping of music items and a second similarity factor with respect to that same user for the "opera" grouping
of music items. This is because the "pop" similarity factor is calculated using only ratings for "pop" items, while the
"opera" similarity factor is calculated only for "opera" items. Any of the methods described above for calculating similarity
factors may be used.
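
A per-group similarity factor might be computed as in the sketch below. The mean squared difference used here is only one possible choice of similarity measure and is an assumption of this sketch; it is distance-like, so a value near 0 indicates similar users. The mapping of each item to a single group and all names are likewise hypothetical.

```python
def group_similarity(ratings_a, ratings_b, item_groups, group):
    """Mean squared rating difference over items of `group` rated by both users.

    Returns None when the two users share no rated item in the group.
    """
    shared = [
        item for item in ratings_a
        if item in ratings_b and item_groups.get(item) == group
    ]
    if not shared:
        return None
    return sum((ratings_a[i] - ratings_b[i]) ** 2 for i in shared) / len(shared)


# Hypothetical data: the two users agree on pop but disagree on opera.
item_groups = {"p1": "pop", "p2": "pop", "o1": "opera", "o2": "opera"}
user_a = {"p1": 6, "p2": 5, "o1": 1, "o2": 7}
user_b = {"p1": 6, "p2": 5, "o1": 7, "o2": 1}
pop_factor = group_similarity(user_a, user_b, item_groups, "pop")
opera_factor = group_similarity(user_a, user_b, item_groups, "opera")
```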

The neighboring users are selected based on the similarity factors (step 108). The neighboring users are weighted,
and recommendations for items are arrived at (steps 110 and 112) as above. A weighted average of the ratings given
to other items in the group can be used to recommend items both inside the group and outside the group. For example,
if a user has a high correlation with another user in the "pop" grouping of music items (the similarity factor between
the users is close to 0), that similarity factor can be used to recommend music items inside the "pop" grouping, since
both users have rated many items in the group. The similarity factor can also be used to recommend a music item
outside of the group, if one of the users has rated an item in another group. Alternatively, a user may select a group,
and a recommendation list will be generated based on the predicted rating for the user's neighboring users in that group.

Whether or not grouping is used, a user or set of users may be recommended to a user as similar in taste. In this
case, the similarity factors calculated from the user profiles and item profiles are used to match similar users and
introduce them to each other. This is done by recommending one user to another in much the same way that an item
is recommended to a user. It is possible to increase the recommendation certainty by including the number of items
rated by both users in addition to the similarity factors calculated for the users.
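
The text does not say how the number of co-rated items is combined with the similarity factor. The following sketch makes one assumption for illustration: candidates sharing fewer than `min_shared` rated items are excluded, and among equal similarity factors the candidate with more shared items is preferred. All names and values are hypothetical.

```python
def recommend_users(candidates, count, min_shared=1):
    """Rank candidate users for recommendation to another user.

    candidates: {user: (similarity_factor, shared_item_count)}
    A smaller similarity factor means more similar taste.
    """
    eligible = [
        (factor, -shared, user)
        for user, (factor, shared) in candidates.items()
        if shared >= min_shared
    ]
    eligible.sort()
    return [user for _, _, user in eligible[:count]]


# Hypothetical candidates: "dee" matches perfectly but on a single item only.
candidates = {"bob": (0.2, 12), "cy": (0.2, 3), "dee": (0.0, 1), "eve": (1.5, 40)}
similar_users = recommend_users(candidates, 2, min_shared=2)
```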

Grouping is a special case of "feature-guided automated collaborative filtering" when there is only one feature of
interest. In the example above, the feature of interest was genre of music. The method of the present invention works
equally well for item domains in which the items have multiple features of interest.

The method using feature-guided automated collaborative filtering incorporates feature values associated with
items in the domain. The term "feature value" is used to describe any information stored about a particular feature of
the item. For example, a feature may have boolean feature values indicating whether or not a particular feature
exists or does not exist in a particular item.

Alternatively, features may have numerous values, such as terms appearing as "keywords" in a document. In some
embodiments, each feature value can be represented by a vector in some metric space, where each term of the vector
corresponds to the mean score given by a user to items having the feature value.

Ideally, it is desirable to calculate a vector of distances between every pair of users, one for each possible feature
value defined for an item. This may not be possible if the number of possible feature values is very large, e.g. keywords
in a document, or the distribution of feature values is extremely sparse. Thus, in many applications, it is desirable to
cluster feature values. The terms "cluster" and "feature value cluster" are used to indicate both individual feature values
as well as feature value clusters, even though feature values may not necessarily be clustered.

Feature value clusters are created by defining a distance function Δ, defined for any two points in the vector space,
as well as a vector combination function Ω, which combines any two vectors in the space to produce a third point in the
space that in some way represents the average of the points. Although not limited to the examples presented, three
possible formulations of Δ and Ω are presented below.
The notion of similarity between any two feature values is how similarly they have been rated by the same user,
across the whole spectrum of users and items. One method of defining the similarity between any two feature values
is to take a simple average. Thus, we define the value $v_i^{\alpha_x}$ to be the mean of the ratings given to each item
containing feature value $FV_\alpha^x$ that user i has rated. Expressed mathematically:

$$v_i^{\alpha_x} = \begin{cases} \dfrac{\sum_p R_p^i \, \Gamma_p^{\alpha_x}}{\sum_p \Gamma_p^{\alpha_x}} & \text{if } \sum_p \Gamma_p^{\alpha_x} > 0 \\[2ex] \text{Undefined} & \text{otherwise} \end{cases}$$

where $R_p^i$ is the rating user i has given to item p, the sums run over the items p that user i has rated, and
$\Gamma_p^{\alpha_x}$ indicates the presence or absence of feature value $FV_\alpha^x$ in item p. Any distance metric may be used to
determine the per-user dimension squared distance between vectors feature value $\alpha_x$ and feature value $\alpha_y$ for user i.
For example, any of the methods referred to above for calculating user similarity may be used.
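
The per-user mean rating for each feature value can be illustrated with a short sketch. The dictionary layout and the feature value labels are hypothetical; a feature value for which the user has rated no item is simply absent from the result, which corresponds to the "undefined" case.

```python
def feature_value_means(user_ratings, item_feature_values):
    """Return {feature_value: mean rating} for a single user.

    user_ratings: {item: rating} for the items this user has rated
    item_feature_values: {item: set of feature values present in the item}
    """
    sums, counts = {}, {}
    for item, rating in user_ratings.items():
        for value in item_feature_values.get(item, ()):
            sums[value] = sums.get(value, 0.0) + rating
            counts[value] = counts.get(value, 0) + 1
    return {value: sums[value] / counts[value] for value in sums}


# Hypothetical ratings and feature values in a music domain.
user_ratings = {"album_a": 6, "album_b": 4, "album_c": 2}
item_feature_values = {
    "album_a": {"producer:x"},
    "album_b": {"producer:x", "label:y"},
    "album_c": {"label:y"},
    "album_d": {"label:z"},
}
means = feature_value_means(user_ratings, item_feature_values)
```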

Defining δ as the per-user dimension squared distance between two feature values, the total distance between
the two feature value vectors is expressed mathematically as:

[equation illegible in source; the legible part is a sum of the per-user distances over the users i]

where one term of the equation represents an adjustment for missing data.

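The combination function for the two vectors, which represents a kind of average for the two vectors, is expressed
mathematically by the following three equations, one for each of the cases

$$\eta_i^{\alpha_x} = 1 \text{ and } \eta_i^{\alpha_y} = 1; \qquad \eta_i^{\alpha_x} = 1 \text{ and } \eta_i^{\alpha_y} = 0; \qquad \eta_i^{\alpha_x} = 0 \text{ and } \eta_i^{\alpha_y} = 1$$

[the value assigned in each case is illegible in source]

wherein $\eta_i^{\alpha_x}$ indicates whether $v_i^{\alpha_x}$ is defined.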

Another method for calculating the similarity between any two feature values is to assume the number of values
used to compute $v_i^{\alpha_x}$ is sufficiently large. If this assumption is made, the Central Limit Theorem can be used to justify
approximating the distribution of vectors by a Gaussian distribution.

Since the Gaussian distribution can be effectively characterized by its mean, variance and sample size, each entry
$v_i^{\alpha_x}$ is now a triplet $\langle \mu_i^{\alpha_x}, \sigma_i^{2\,\alpha_x}, N_i^{\alpha_x} \rangle$, where $\mu_i^{\alpha_x}$ is the sample mean of the population, $\sigma_i^{2\,\alpha_x}$
is the variance of the sampling distribution, and $N_i^{\alpha_x}$ is the sample size.

The total distance between the two feature value vectors is expressed mathematically by:

[equation illegible in source; only the left-hand side, $\Delta(FV_\alpha^x, FV_\alpha^y)$, survives]



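The feature value combination function combines the corresponding triplets from the two vectors by treating them
as Gaussians, and therefore is represented mathematically by:

$$\Omega(FV_\alpha^x, FV_\alpha^y)_i = \begin{cases} \langle \mu_i^{\alpha_{xy}}, \sigma_i^{2\,\alpha_{xy}}, N_i^{\alpha_{xy}} \rangle & \text{if } \eta_i^{\alpha_x} = 1 \text{ and } \eta_i^{\alpha_y} = 1 \\ \langle \mu_i^{\alpha_x}, \sigma_i^{2\,\alpha_x}, N_i^{\alpha_x} \rangle & \text{if } \eta_i^{\alpha_x} = 1 \text{ and } \eta_i^{\alpha_y} = 0 \\ \langle \mu_i^{\alpha_y}, \sigma_i^{2\,\alpha_y}, N_i^{\alpha_y} \rangle & \text{if } \eta_i^{\alpha_x} = 0 \text{ and } \eta_i^{\alpha_y} = 1 \end{cases}$$

where $\mu_i^{\alpha_{xy}}$ represents the mean of the new population, $\sigma_i^{2\,\alpha_{xy}}$ represents the variance of the combined
population, and $N_i^{\alpha_{xy}}$ represents the sample size of the combined population.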
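
The exact expressions for the combined mean, variance and sample size are not legible in this copy. As an illustration only, the sketch below merges two (mean, variance, sample size) triplets using the standard formulas for pooling two populations, treating the variance entry as a population variance. That interpretation, the function name and the use of `None` for an undefined triplet are all assumptions of the sketch.

```python
def combine_triplets(a, b):
    """Merge two (mean, variance, n) triplets; None stands for an undefined entry.

    If only one triplet is defined it is returned unchanged; if neither is
    defined the result is None.
    """
    if a is None:
        return b
    if b is None:
        return a
    mean_a, var_a, n_a = a
    mean_b, var_b, n_b = b
    n = n_a + n_b
    mean = (n_a * mean_a + n_b * mean_b) / n
    # Pooled second moment, then subtract the square of the pooled mean.
    second_moment = (n_a * (var_a + mean_a ** 2) + n_b * (var_b + mean_b ** 2)) / n
    return (mean, second_moment - mean ** 2, n)


# Hypothetical example: ratings {2, 2} merged with ratings {6, 6}.
combined = combine_triplets((2.0, 0.0, 2), (6.0, 0.0, 2))
```

In the example the merged population is {2, 2, 6, 6}, whose mean is 4 and whose population variance is 4, which is what the pooled formulas return.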

The third method of calculating feature value similarity metrics attempts to take into account the variance of the
sampling distribution when the sample size of the population is small. A more accurate estimator of the population
variance is given by the term

[equation illegible in source]

and represents the sample variance, which is an accurate estimator of the underlying population variance.
Accordingly, the operator $\eta_i^{\alpha_x}$ is redefined as:

[equation illegible in source]

and the triplet is defined as:

[equation illegible in source; the surviving fragment ends "0 otherwise"]

Given the above, the sample variance is represented as:

[equation illegible in source]
The sample variance and the variance of the sample distribution for a finite population are related by the following
relationship:

[equation illegible in source]

which transforms the standard deviation into:

[equation illegible in source]

Thus, the feature value vector combination function is defined as:

[equation illegible in source; the surviving fragment is a triplet whose selection depends on the values of η]
Regardless of the feature value combination function used, the item similarity metrics generated by them are used
to generate feature value clusters. Feature value clusters are generated from the item similarity metrics using any
clustering algorithm known in the art. For example, the method described above with respect to grouping items could
be used to group values within each feature.

Feature values can be clustered both periodically and incrementally. Incremental clustering is necessary when the
number of feature values for items is so large that reclustering of all feature values cannot be done conveniently.
However, incremental clustering may be used for any set of items, and it is preferred to use both periodic reclustering
and incremental reclustering.

All feature values are periodically reclustered using any clustering method known in the art, such as eigenanalysis.
It is preferred that this be done infrequently, because of the time that may be required to complete such a reclustering.
In order to cluster new feature values present in items new to the domain, feature values are incrementally clustered.
New feature values present in the new items are clustered into the already existing feature value clusters. These feature
values may or may not be reclustered into another feature value cluster when the next complete reclustering is done.
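
Incremental clustering can be pictured as placing each new feature value into whichever existing cluster it is closest to. The sketch below assumes that each cluster is summarized by a centroid vector, that the new feature value's vector has no missing entries, and that squared Euclidean distance is used; none of these choices is prescribed by the text, and all names are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_existing_cluster(new_vector, centroids):
    """Return the name of the existing cluster whose centroid is nearest.

    centroids: {cluster_name: centroid_vector}
    """
    return min(centroids, key=lambda name: squared_distance(new_vector, centroids[name]))


# Hypothetical centroids of two existing feature value clusters.
centroids = {"cluster_1": [6.0, 5.5, 6.5], "cluster_2": [2.0, 1.5, 3.0]}
new_feature_value = [5.0, 6.0, 6.0]
chosen = assign_to_existing_cluster(new_feature_value, centroids)
```

The existing centroids are left untouched here; the periodic full reclustering is what later corrects any placement that turns out to be poor.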

Using the feature value clusters generated by any one of the methods described above, a method for recommending
an item, as shown in FIG. 2, uses feature clusters to aid in predicting ratings and proceeds as the method of FIG.
1, in that a plurality of user profiles is stored (step 102') and a plurality of item profiles are stored (step 104'). The
method using feature value clusters assigns a weight to each feature value cluster and a weight to each feature based
on the user's rating of the item (steps 120 and 122).

A feature value cluster weight for each cluster is calculated for each user based on the user's ratings of items
containing that cluster. The cluster weight is an indication of how important a particular user seems to find a particular
feature value cluster. For example, a feature for an item in a music domain might be the identity of the producer. If a
user rated highly all items having a particular producer (or cluster of producers), then the user appears to place great
emphasis on that particular producer (feature value) or cluster of producers (feature value cluster).

Any method of assigning feature value cluster weight that takes into account the user's rating of the item and the
existence of the feature value cluster for that item is sufficient; however, it is currently preferred to assign feature value
cluster weights by summing all of the item ratings that a user has entered and dividing by the number of feature value
clusters. Expressed mathematically, the vector weight for cluster x of feature α for user i is:

[equation illegible in source; the surviving fragment assigns the weight 0.0 in the "otherwise" case]

where $\Gamma_p^{\alpha_x}$ is a boolean operator indicating whether item p contains the feature value cluster x of feature α.

The feature value cluster weight is used, in turn, to define a feature weight. The feature weight reflects the importance
of that feature relative to the other features for a particular feature. Any method of estimating a feature weight
can be used; for example, feature weights may be defined as the reciprocal of the number of features defined for all
items. It is preferred that feature weights are defined as the standard deviation of all cluster weights divided by the
mean of all cluster weights. Expressed mathematically:

$$FW_i^{\alpha} = \frac{\mathrm{StandardDev}(CW_i^{\alpha})}{\mathrm{Mean}(CW_i^{\alpha})}$$
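
The preferred feature weight, the standard deviation of a user's cluster weights for a feature divided by their mean, can be sketched as below. The text does not say whether the population or the sample standard deviation is intended; the sketch assumes the population form (`statistics.pstdev`) and returns 0.0 for an empty or zero-mean set of weights, both of which are assumptions.

```python
from statistics import mean, pstdev


def feature_weight(cluster_weights):
    """Standard deviation of the cluster weights divided by their mean."""
    if not cluster_weights:
        return 0.0
    average = mean(cluster_weights)
    if average == 0:
        return 0.0
    return pstdev(cluster_weights) / average


# Hypothetical cluster weights of one user for the clusters of two features.
producer_weight = feature_weight([2.0, 4.0, 6.0])  # ratings vary with producer
label_weight = feature_weight([5.0, 5.0, 5.0])     # ratings do not vary with label
```

A user whose cluster weights are all equal for a feature does not appear to discriminate on that feature, so the feature receives a weight of zero.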



The feature value cluster weights and the feature weights are used to calculate the similarity factor between two
users. The similarity factor between two users may be calculated by any method that takes into account the assigned
weights. For example, any of the methods for calculating the similarity between two users, as described above, may
be used provided they are augmented by the feature weights and feature value weights. Thus

[equation illegible in source; the surviving fragment includes the feature weight FW as a factor]

represents the similarity between users i and j, where η is a boolean operator on a vector of values indicating
whether feature value cluster x for feature α of the vector is defined, and where

[equation illegible in source; the surviving fragment assigns the value 0.0 in the "otherwise" case]



The representation of an item as a set of feature values allows the application of various feature-based similarity
metrics between items. Two items may not share any identical feature values but still be considered quite similar to
each other if they share some feature value clusters. This allows the recommendation of unrated items to a user based
on the unrated item's similarity to other items which the user has already rated highly.
The similarity between two items p1 and p2, where P1 and P2 represent the corresponding sets of feature values
possessed by these items, can be represented as some function, f, of the following three sets: the number of common
feature values shared by the two items; the number of feature values that p1 possesses that p2 does not; and the
number of feature values that p2 possesses that p1 does not.

Thus, the similarity between two items, denoted by S(p1, p2), is represented as:

$$S(p_1, p_2) = f(P_1 \cap P_2,\; P_1 - P_2,\; P_2 - P_1)$$

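Each item is treated as a vector of feature value clusters and the item-item similarity metrics are defined as:

[equation illegible in source; the legible part is a double sum over the defined features α and over the clusters x of each feature]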



This metric is personalized to each user since the feature weights and cluster weights reflect the relative importance
of a particular feature value to a user.
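
Returning to the three set-based quantities introduced above (feature values shared by both items, values only the first item has, and values only the second item has), the sketch below computes them and then applies one simple, hypothetical choice of the function f: the shared count divided by the total count. Unlike the personalized metric, this sketch uses no feature weights or cluster weights, and the ratio form is an assumption made purely for illustration.

```python
def feature_set_comparison(values_1, values_2):
    """Return (shared, only in first, only in second) counts for two items."""
    p1, p2 = set(values_1), set(values_2)
    return len(p1 & p2), len(p1 - p2), len(p2 - p1)


def ratio_similarity(values_1, values_2):
    """Shared feature values as a fraction of all feature values of either item."""
    common, only_1, only_2 = feature_set_comparison(values_1, values_2)
    total = common + only_1 + only_2
    return common / total if total else 0.0


# Hypothetical feature values for two recordings.
item_1 = {"genre:rock", "producer:x", "label:y"}
item_2 = {"genre:rock", "producer:z", "label:y"}
counts = feature_set_comparison(item_1, item_2)
similarity = ratio_similarity(item_1, item_2)
```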

Another method of defining item-item similarity metrics attempts to take into account the case where one pair of
items has numerous identical feature values, because if two items share a number of identical feature values, they are
more similar to each other than two items that do not share feature values. Using this method, S(p1, p2) is defined as:

[equation illegible in source; the legible part is a double sum over the defined features α and over the clusters x of each feature]

Another method for calculating item-item similarity is to treat each item as a vector of feature value clusters and
then compute the weighted dot product of the two vectors.

[the equation and the definition that followed it are illegible in source]



The methods described above can be provided as software on any suitable medium that is readable by a computing
device. The software program means may be implemented in any suitable language such as C, C++, PERL, LISP,
ADA, assembly language or machine code. The suitable media may be any device capable of storing program means
in a computer-readable fashion, such as a floppy disk, a hard disk, an optical disk, a CD-ROM, a magnetic tape, a
memory card, or a removable magnetic drive.
An apparatus may be provided to recommend items to a user. The apparatus, as shown in FIG. 3, has a memory
element 12 for storing user and item profiles. Memory element 12 can be any memory element capable of storing the
profiles, such as RAM, EPROM, or magnetic media.

A means 14 for calculating is provided which calculates the similarity factors between users. Calculating means
14 may be specialized hardware to do the calculation or, alternatively, calculating means 14 may be a microprocessor
or software running on a microprocessor resident in a general-purpose computer.

Means 16 for selecting is also provided to select neighboring users responsive to the similarity factors. Again,
specialized hardware or a microprocessor may be provided to implement the selecting means 16; however, it is preferred
to provide a software program running on a microprocessor resident in a general-purpose computer. Selecting means
16 may be a separate microprocessor from calculating means 14 or it may be the same microprocessor.

A means 18 for assigning a weight to each of the neighboring users is provided and can be specialized hardware,
a separate microprocessor, the same microprocessor as calculating means 14 and selecting means 16, or a
microprocessor resident in a general-purpose computer and running software.

In some embodiments a receiving means is included in the apparatus (not shown in FIG. 3). Receiving means is
any device which receives ratings for items from users. The receiving means may be a keyboard or mouse connected
to a personal computer. In some embodiments, an electronic mail system operating over a local area network or a wide
area network forms the receiving means. In the preferred embodiment, a World Wide Web Page connected to the
Internet forms the receiving means.

Also included in the apparatus is means 20 for recommending at least one of the items to the users based on the
weights assigned to the users' neighboring users and the ratings given to the item by the users' neighboring users.
Recommendation means 20 may be specialized hardware, a microprocessor, or, as above, a microprocessor running
software and resident on a general-purpose computer. Recommendation means 20 may also comprise an output device
such as a display, audio output, or printed output.

In another embodiment an apparatus for recommending an item is provided that uses feature weights and feature
value weights. This apparatus is similar to the one described above except that it also includes a means for assigning
a feature value cluster weight 22 and a means for assigning a feature weight 24 (not shown in FIG. 3). Feature value
cluster weight assigning means 22 and feature weight assigning means 24 may be provided as specialized
hardware, a separate microprocessor, the same microprocessor as the other means, or as a single microprocessor in
a general-purpose computer.

FIG. 4 shows the Internet system on which the preferred method and apparatus may be used. The server 40 is
an apparatus as shown in FIG. 3, and it is preferred that server 40 displays a World Wide Web Page when accessed
by a user via Internet 42. Server 40 also accepts input over the Internet 42. Multiple users 44 may access server 40
simultaneously.

EXAMPLE 

The following example is one way of using the invention, which can be used to recommend items in various domains
for many items. By way of example, a new user 44 accesses the system via the World Wide Web. The system displays
a welcome page, which allows the user 44 to create an alias to use when accessing the system. Once the user 44 has
entered a personal alias, the user 44 is asked to rate a number of items; in this example the items to be rated are
recording artists in the music domain.

After the user 44 has submitted ratings for various recording artists, the system allows the user 44 to enter ratings
for additional artists or to request recommendations. If the user 44 desires to enter ratings for additional artists, the
system can provide a list of artists the user 44 has not yet rated. For the example, the system can simply provide a
random listing of artists not yet rated by the user 44. Alternatively, the user 44 can request to rate artists that are similar
to recording artists they have already rated, and the system will provide a list of similar artists using the item similarity
values previously calculated by the system. The user can also request to rate recording artists from a particular group,
e.g. modern jazz, rock, or big band, and the system will provide the user 44 with a list of artists belonging to that group
that the user 44 has not yet rated. The user 44 can also request to rate more artists that the user's 44 neighboring
users have rated, and the system will provide the user 44 with a list of artists by selecting artists rated by the user's
44 neighboring users.

The user 44 can request the system to make artist recommendations at any time, and the system allows the user
44 to tailor their request based on a number of different factors. Thus, the system can recommend artists from various
groups that the user's 44 neighboring users have also rated highly. Similarly, the system can recommend a predetermined
number of artists from a particular group that the user will enjoy, e.g. opera singers. Alternatively, the system
may combine these approaches and recommend only opera singers that the user's neighboring users have rated highly.

The system allows the user 44 to switch between rating items and receiving recommendations many times. The
system also provides a messaging function, so that users 44 may leave messages for other users that are not currently
using the system. The system provides "chat rooms," which allow users 44 to engage in conversation with other users
44' that are currently accessing the system. These features are provided to allow users 44 to communicate with one
another. The system facilitates user communication by informing a user 44 that another user 44' shares an interest in
a particular recording artist. Also, the system may inform a user 44 that another user 44' that shares an interest in a
particular recording artist is currently accessing the system; the system will not only inform the user 44, but will
encourage the user 44 to contact the other user 44' that shares the interest. The user 44 may leave the system by leaving
the Web Page.

Having described preferred embodiments of the invention, it will now become apparent to one of skill in the art
that other embodiments incorporating the concepts may be used. It is felt, therefore, that these embodiments should
not be limited to disclosed embodiments but rather should be limited only by the scope of the following claims.



Claims 

1. A method for recommending an item to one of a plurality of users, the item not yet rated by the user, the method 
comprising the steps of: 

(a) storing a user profile in a memory for each of a plurality of users, wherein the user profile includes a plurality
of values, each of at least some of the plurality of values representing a rating given to one of a plurality of
items by the user;

(b) storing an item profile in a memory for each of the plurality of items, each of the plurality of items belonging
to one of a plurality of groups, wherein the item profile includes a plurality of values, each of at least some of
the plurality of values representing a rating given to the item by one of the plurality of users;

(c) calculating, for each of the plurality of users, a plurality of similarity factors, each of the plurality of similarity
factors representing the similarity between each user and another of the plurality of users based on item ratings
for a particular group;

(d) selecting, for each of the plurality of users, a plurality of neighboring users with respect to each group, the
selection responsive to the similarity factors;

(e) assigning a weight to each of the neighboring users; and

(f) recommending an item to one of the plurality of users based on the weights assigned to the user's neighboring
users and the ratings given to the unrated item by the user's neighboring users.

2. The method of claim 1 wherein step (c) further comprises: 

(c-a) retrieving the item profile for one of the plurality of items that has been rated by one of the plurality of users;

(c-b) determining, from the item's profile, other users that have also rated the item;

(c-c) calculating a similarity factor between the one user and each of the plurality of other users that have also
rated the item;

(c-d) repeating steps (c-a) through (c-c) until all items rated by the one user for the one group have been
retrieved; and

(c-e) repeating steps (c-a) through (c-d) until similarity factors for each user have been calculated.

3. The method of claim 2 wherein step (c-c) further comprises calculating a similarity factor between the one user
and each of the plurality of other users that have also rated the item, said similarity factor based only on ratings
for other items belonging to the same group.

4. The method of any preceding claim wherein step (d) comprises selecting, for each of the plurality of users, a
plurality of neighboring users in each group, each selected neighboring user having a similarity factor less than a 
predetermined threshold value. 

5. The method of claim 4 wherein step (e) further comprises:

subtracting, for each neighboring user, the similarity factor for that neighboring user from a predetermined 
threshold value and dividing each difference by the predetermined threshold value. 

6. The method of any preceding claim wherein step (f) comprises:

predicting a rating for each item in one of the plurality of groups not yet rated by one of the plurality of users
by taking a weighted average of the ratings given to the items in the group by the one user's neighboring users;
and

recommending a predetermined number of items from the group based on the predicted ratings for those items.

7. The method of any of claims 1 to 5 wherein step (f) comprises:

receiving an item selection from one of the plurality of users; and

predicting a rating for the selected item by taking a weighted average of the ratings given to the selected item
by the one user's neighboring users for the group.

8. The method of any of claims 1 to 5 wherein step (f) comprises:

receiving a group selection from one of the plurality of users;

predicting a rating for items in the selected group by taking a weighted average of the ratings given to the
items in the group by the one user's neighboring users for that group; and

recommending a predetermined number of items in the group based on the predicted ratings for those items.

9. The method of any preceding claim further comprising the step of:

(g) displaying information about recommended items on a display.

10. The method of any preceding claim wherein an item belongs to multiple groups.

11. A method for recommending, to one of a plurality of users, other users, the method comprising the steps of:

(a) storing a user profile in a memory for each of a plurality of users, wherein the user profile includes a plurality
of values, each of at least some of the plurality of values representing a rating given to one of a plurality of
items by the user;

(b) storing an item profile in a memory for each of the plurality of items, each of the plurality of items belonging
to one of a plurality of groups, wherein the item profile includes a plurality of values, each of at least some of
the plurality of values representing a rating given to the item by one of the plurality of users;

(c) calculating, for each of the plurality of users, a plurality of similarity factors, each of the plurality of similarity
factors representing the similarity between each user and another one of the plurality of users based on the
item ratings for a particular group;

(d) recommending at least one of the neighboring users to one of the plurality of users based on the similarity 
factors. 

12. The method of claim 11 wherein step (d) further comprises recommending at least one of the neighboring users 
to one of the plurality of users based on the similarity factors and the number of items rated by both the one user
and the at least one neighboring user. 

40 13, A method for predicting a rating for an Item, the Item having at least one feature defining a characteristic of the 
item and having a pluralrty of possible values that nnay be grouped into clusters of feature values, the method 
comprising the steps of: 

(a) storhg a user profile in a memory for each of a plurality of users, wherein the user profile includes a plurality 
45 of values, each of at least some of the plurality of values representing a rating given to one d a plurality of 

items t>y the user; 

(b) storing an Item profile In a mennory for each of the plurality of items, wherein the item profile includes a 
plurality of values, each of at least some of the pluraiily d( values representing a rating given 1o the item by 
one of the plurality of users; 

BO (c) assigning, for each user, a weight to each value cluster within each feature; 

(d) assigning, for each user, a weight to each feature; 

(e) calculating, for each of the plurality of users, a plurality of similarity factors, each of the plurality of similarity
factors based on the feature weights, the cluster weights, and the ratings given to items by the respective users;

(f) selecting, for each of the plurality of users, a plurality of neighboring users responsive to the similarity factors;

(g) assigning a weight to each of the neighboring users; and

(h) recommending an item to one of the plurality of users based on the weights assigned to the user's
neighboring users and the ratings given to the unrated item by the user's neighboring users.
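
As a minimal sketch of steps (g) and (h) only, the Python function below predicts a rating for an unrated item as a weighted average of the ratings given to that item by a user's neighboring users. It assumes the neighbor weights have already been derived (for example from the similarity factors of step (e)); the function name predict_rating and the dictionary layout are hypothetical and are not taken from the claims.

```python
from typing import Dict, Optional


def predict_rating(neighbor_weights: Dict[str, float],
                   neighbor_ratings: Dict[str, float]) -> Optional[float]:
    """Weighted average of the neighbors' ratings for one unrated item.

    neighbor_weights: neighbor name -> weight assigned in step (g).
    neighbor_ratings: neighbor name -> rating that neighbor gave the item.
    Neighbors who have not rated the item are ignored. Returns None when
    no weighted neighbor has rated the item.
    """
    total = 0.0
    weight_sum = 0.0
    for name, weight in neighbor_weights.items():
        if name in neighbor_ratings:
            total += weight * neighbor_ratings[name]
            weight_sum += weight
    if weight_sum == 0:
        return None
    return total / weight_sum
```

An item could then be recommended when its predicted rating exceeds some threshold chosen by the implementer; the claim itself does not fix such a threshold.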

14. The method of claim 13 wherein step (c) further comprises:

assigning, for each user, a weight to each value cluster within each feature based on the rating given to the
item by the user and the number of feature value clusters present.

15. The method of claim 13 or 14 wherein step (d) further comprises:

assigning, for each user, a weight to each feature based on the number of features defined for the item.

16. The method of claim 13 or 14 wherein step (d) further comprises:

assigning, for each user, a weight to each feature based on the weights assigned to each feature value cluster.

17. The method of claim 16, wherein the feature weight is assigned by dividing the standard deviation of the cluster
weights by the mean of the cluster weights. 
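
The calculation recited in claim 17 can be sketched in Python as follows. The claim does not state whether the population or the sample standard deviation is intended; this example assumes the population form (statistics.pstdev). The function name feature_weight and the zero-mean guard are likewise assumptions added for the example.

```python
from statistics import mean, pstdev
from typing import Sequence


def feature_weight(cluster_weights: Sequence[float]) -> float:
    """Feature weight = standard deviation / mean of the cluster weights.

    Assumption: population standard deviation. A zero mean is mapped to a
    weight of 0.0 to avoid division by zero (not addressed in the claim).
    """
    m = mean(cluster_weights)
    if m == 0:
        return 0.0
    return pstdev(cluster_weights) / m
```

Under this formula a feature whose value clusters all carry the same weight receives a feature weight of zero, while a feature whose cluster weights vary widely relative to their mean receives a larger weight.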

18. The method of any of claims 13 to 17, wherein step (c) further comprises the steps of:

(c-a) calculating, for each feature value, a similarity factor to all other feature values in a feature; and 
(c-b) grouping the feature values into clusters based on the similarity factors. 
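
Steps (c-a) and (c-b) might be sketched as below. The example assumes that a pairwise similarity function between feature values is supplied by the caller and that two values belong to the same cluster whenever a chain of pairwise similarities at or above a threshold connects them (single-link grouping). Both the grouping rule and the names cluster_values, sim and threshold are assumptions for illustration; the claim does not specify a clustering technique.

```python
from typing import Callable, Dict, List, Sequence


def cluster_values(values: Sequence[str],
                   sim: Callable[[str, str], float],
                   threshold: float) -> List[List[str]]:
    """Group feature values into clusters by pairwise similarity.

    Two values are merged when sim(a, b) >= threshold; merging is
    transitive (single-link), implemented with a small union-find.
    """
    parent: Dict[str, str] = {v: v for v in values}

    def find(v: str) -> str:
        # Follow parent links to the cluster representative.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # (c-a) compare every feature value with every other feature value
    for i, a in enumerate(values):
        for b in values[i + 1:]:
            if sim(a, b) >= threshold:
                parent[find(a)] = find(b)

    # (c-b) collect the values under their cluster representatives
    clusters: Dict[str, List[str]] = {}
    for v in values:
        clusters.setdefault(find(v), []).append(v)
    return sorted(sorted(c) for c in clusters.values())
```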

19. A method for recommending an item to one of a plurality of users, the item having at least one feature defining a 
characteristic of the item and having a plurality of possible values that may be grouped into clusters of feature
values, the method comprising the steps of: 

(a) storing a user profile in a memory for each of a plurality of users, wherein the user profile includes a plurality
of values, each of at least some of the plurality of values representing a rating given to one of a plurality of
items by the user;

(b) storing an item profile in a memory for each of the plurality of items, wherein the item profile includes a
plurality of values, each of at least some of the plurality of values representing a rating given to the item by 
one of the plurality of users; 

(c) assigning, for at least one user, a weight to each value cluster within each feature; 

(d) assigning, for the at least one user, a weight to each feature;

(e) calculating, for each of the plurality of items, a plurality of similarity factors, each of the plurality of similarity
factors based on the feature weights and the cluster weights;

(f) selecting an item for which a favourable rating has been received from the at least one user;

(g) selecting a plurality of items responsive to the selected item and the similarity factors;

(h) recommending at least one of the selected items to the at least one user. 
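
Steps (f) to (h) of claim 19 can be illustrated with the following Python sketch. It assumes the item-to-item similarity factors of step (e) have already been calculated and are passed in as a nested dictionary, and it treats any rating at or above a caller-chosen threshold as favourable. The names recommend_similar_items, item_sims and favourable are hypothetical.

```python
from typing import Dict, List


def recommend_similar_items(user_ratings: Dict[str, float],
                            item_sims: Dict[str, Dict[str, float]],
                            favourable: float = 4.0,
                            k: int = 2) -> List[str]:
    """Recommend unrated items similar to items the user rated favourably.

    user_ratings: item -> rating given by the user.
    item_sims: item -> {other item -> precomputed similarity factor}.
    """
    # (f) select items for which a favourable rating has been received
    liked = [item for item, r in user_ratings.items() if r >= favourable]

    # (g) select candidate items responsive to the liked items and the
    # similarity factors, keeping the best factor seen for each candidate
    scores: Dict[str, float] = {}
    for seed in liked:
        for other, s in item_sims.get(seed, {}).items():
            if other not in user_ratings:
                scores[other] = max(scores.get(other, 0.0), s)

    # (h) recommend the k most similar candidates
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]
```

Items the user has already rated are excluded from the candidates in this sketch, which is a design choice of the example rather than a requirement stated in the claim.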

20. An article of manufacture having program means embodied therein, the program means for predicting a rating for
an item, the item having at least one feature defining a characteristic of the item and having a plurality of possible
values that may be grouped into clusters of feature values, the article of manufacture comprising:

computer-readable program means for storing a user profile in a memory for each of a plurality of users, 
wherein the user profile includes a plurality of values, each of at least some of the plurality of values representing
a rating given to one of a plurality of items by the user;

computer-readable program means for storing an item profile in a memory for each of the plurality of items,
wherein the item profile includes a plurality of values, each of at least some of the plurality of values representing
a rating given to the item by one of the plurality of users;

computer-readable program means for assigning, for each user, a weight to each value cluster within each
feature; 

computer-readable program means for assigning, for each user, a weight to each feature;

computer-readable program means for calculating, for each of the plurality of users, a plurality of similarity
factors, each of the plurality of similarity factors based on the feature weights, the cluster weights, and the
ratings given to items by the respective users; 

computer-readable program means for selecting, for each of the plurality of users, a plurality of neighboring 
users responsive to the similarity factors; 

computer-readable program means for assigning a weight to each of the neighboring users; and

computer-readable program means for recommending an item to one of the plurality of users based on the
weights assigned to the user's neighboring users and the ratings given to the unrated item by the user's neighboring
users.

21. An apparatus for recommending an item, the item having at least one feature defining a characteristic of the item
and having a plurality of possible values that may be grouped into clusters of feature values, comprising:

a memory element for storing user profiles, wherein each user profile includes a plurality of values, each of at
least some of the plurality of values representing a rating given to one of a plurality of items by the user;

a memory element for storing item profiles, wherein each item profile includes a plurality of values, each of at
least some of the plurality of values representing a rating given to the item by one of the plurality of users;

means for assigning, for each user, a weight to each value cluster within each feature;

means for assigning, for each user, a weight to each feature;

means for calculating, for each of the plurality of users, a plurality of similarity factors, each of the plurality of
similarity factors based on the feature weights, the cluster weights, and the ratings given to items by the
respective users;

means for selecting, for each of the plurality of users, a plurality of neighboring users responsive to the similarity 
factors; 

means for assigning a weight to each of the neighboring users; and

means for recommending an item to one of the plurality of users based on the weights assigned to the user's
neighboring users and the ratings given to the unrated item by the user's neighboring users.